Random vector generation of a semantic space

Authors

  • Jean-François Delpech
  • Sabine Ploux
Abstract

We show how random vectors and random projection can be implemented in the usual vector space model to construct a Euclidean semantic space from a French synonym dictionary. We evaluate theoretically the resulting noise and show the experimental distribution of the similarities of terms in a neighborhood according to the choice of parameters. We also show that the Schmidt orthogonalization process is applicable and can be used to separate homonyms with distinct semantic meanings. Neighboring terms are easily arranged into semantically significant clusters which are well suited to the generation of realistic lists of synonyms and to such applications as word selection for automatic text generation. This process, applicable to any language, can easily be extended to collocations, is extremely fast and can be updated in real time, whenever new synonyms are proposed.

1. Introduction

In their seminal work, Ploux and Victorri [1] have used synonymy relations deduced from French electronic dictionaries to create semantic spaces around French words and their neighbors. Their definition of "synonymy" is fairly broad and includes hyponymy (moineau and oiseau), hyperonymy (arme and pistolet) or even non-synonymous, but related terms (autocar and automobile); however, in their work, true synonyms (i.e. terms which are more or less interchangeable) form cliques of the graph of synonyms, i.e. maximally complete subgraphs. While this is very interesting from a theoretical standpoint, as it then becomes straightforward to evaluate an interclique distance (or degree of separation) between any two terms in the graph (as long as neither belongs to an island, such as lapereau and lapinot), it is not very useful in practice. For example, an author in search of the right term may well not be interested in strict synonyms; terms with related or even opposed meanings can often be preferable in rhetorical figures.
Also, in many applications such as automatic text generation, a well-defined and mathematically well-behaved semantic distance between terms is often a prerequisite. In this report, we show how a Euclidean semantic distance can quickly and easily be constructed from Ploux and Victorri's database (which contains 54,685 terms and 116,694 cliques). Since the pioneering work of Salton [2, 3], it is well understood that any combination of terms, such as a clique, can be seen as a vector in a space where each dimension represents a distinct term (or lemma.) (1) This representation is extremely fruitful and forms the basis of …
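The ideas named in the abstract (random vectors attached to cliques, similarity between terms, and a Schmidt orthogonalization step) can be illustrated with a minimal sketch. This is not the authors' implementation: the construction used here (one random unit vector per clique, each term being the sum of the vectors of the cliques it belongs to), the dimension of 512, the toy cliques and all function names are assumptions chosen only for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a Gaussian vector and normalize it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_term_vectors(cliques, dim=512, seed=0):
    """Assumed construction: each clique gets one random vector,
    and a term vector is the sum of the vectors of its cliques."""
    rng = random.Random(seed)
    term_vectors = {}
    for clique in cliques:
        clique_vec = random_unit_vector(dim, rng)
        for term in clique:
            acc = term_vectors.setdefault(term, [0.0] * dim)
            for i, x in enumerate(clique_vec):
                acc[i] += x
    return term_vectors


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reject(u, v):
    """One Schmidt step: remove from u its component along v."""
    scale = sum(a * b for a, b in zip(u, v)) / sum(b * b for b in v)
    return [a - scale * b for a, b in zip(u, v)]


# Hypothetical toy cliques, not taken from the Ploux and Victorri database.
cliques = [
    {"voiture", "automobile", "auto"},
    {"voiture", "wagon"},
    {"autocar", "bus", "car"},
]

vectors = build_term_vectors(cliques)
sim_shared = cosine(vectors["voiture"], vectors["automobile"])   # share a clique
sim_unrelated = cosine(vectors["voiture"], vectors["bus"])       # share none
residual = reject(vectors["voiture"], vectors["automobile"])     # "wagon" sense remains
```

In this sketch, terms that share a clique get a clearly positive similarity, while terms with no clique in common are only nearly orthogonal: random high-dimensional vectors leave a small residual noise that shrinks as the dimension grows. The `reject` step removes one sense from a term vector, which is the kind of operation that could be used to separate distinct meanings of a homonym.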




Journal:
  • CoRR

Volume: abs/1703.02031

Publication date: 2017